Pay attention to permissions: do not mv the files as the root user, and do not change their permissions, as either mistake will also make the data unreadable.
My own test environment was CDH 5.0.2 with Hadoop 2.3.
The data.dir was /hdp2/dfs/data; add a new path, /hdp2/dfs/data2.
Because the data2 directory is empty, starting the DataNode initializes it, for example by creating the VERSION file, the block pool, and the other directories.
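As a hedged sketch of what that change looks like in hdfs-site.xml (the property name dfs.datanode.data.dir is the Hadoop 2.x name; the paths are the ones mentioned above):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hdp2/dfs/data,/hdp2/dfs/data2</value>
</property>

After the DataNode is restarted, the empty /hdp2/dfs/data2 gets its own VERSION file and block-pool subdirectories, while the existing blocks stay under /hdp2/dfs/data.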
disconnection phenomenon. The initial suspicion was a network problem, but I also wanted to consider an optimization scheme from the Ganglia side itself.
① gmetad.conf
HDP1:
  data_source "zhj" localhost
  gridname "ZHJ"
HDP2:
  data_source "zhj" hdp1
  gridname "ZHJ"
② gmond.conf
HDP1:
  cluster {
    name = "zhj"
    owner = "unspecified"
    latlong = "unspecified"
    url = "unspecified"
  }
  udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
                         # This option tells gmond to use a source address
                         # that resolves to the machine's hostname. Without
                         # this, the metrics could appear to come from any
                         # interface and the DNS names associated with
                         # those IPs would be used to create the RRDs.
    #mcast_join = 239.2.11.71
    host = hdp3    (in practice this can only be the host name, not localhost)
    port = 8649
    ttl = 1
  }
  /* You can specify as many udp_recv_channels as you like as well. */
  udp_recv_channel {
    #mcast_join = 239.2.11.71
    port = 8649
    #bind = 239.2.11.71
    #retry_bind = true
    # Siz
wait until the map phase ends before it can start, which does not use network bandwidth efficiently.
2. A single SQL statement is typically parsed into multiple MR jobs, and Hadoop writes each job's output directly to HDFS, so performance is poor.
3. Every job has to start its own tasks, which costs a lot of time and makes real-time queries impossible.
4. When SQL is converted to a MapReduce job, the SQL functions performed by map, shuffle, and reduce are different, so chains such as map->mapreduce or mapreduce->reduce are needed. This reduces the number of HDFS writes, which can improve performance.
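A quick way to see how many MR stages a statement is parsed into is Hive's EXPLAIN. A hedged example (the users and orders tables are hypothetical):

EXPLAIN
SELECT u.city, COUNT(*)
FROM users u JOIN orders o ON (u.id = o.user_id)
GROUP BY u.city;

A join followed by an aggregation like this typically shows up as more than one MapReduce stage in the plan, and each intermediate stage is materialized to HDFS, which is exactly the overhead described in point 2 above.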
HDP (Hortonworks Data Platform) is a 100% open-source Hadoop distribution from Hortonworks. With YARN as its architectural center, it includes components such as Pig, Hive, Phoenix, HBase, Storm, and Spark; in the latest version, 2.4, the monitoring UI is implemented with Grafana integration.
Installation process:
Cluster planning
Package download: (the HDP 2.4 installation package is very large, so offline installation is recommended)
HDP Installation and Deployment
Cluster Planning:
192.168
ApplicationMaster to communicate with the NodeManager. Both types of container can be on any node, and their locations are generally random; that is, the ApplicationMaster may end up running on the same node as the tasks it manages. Container is one of the most important concepts in YARN, and it is essential for understanding YARN's resource model.
Note: map/reduce tasks, for example, run inside containers, so the mapreduce.map(reduce).memory.mb mentioned above must be greater than the heap size given in mapreduce.map(reduce).java.opts.
/2.4.2.0-258/hive/lib (execute the above command again, changing the machine name marked in red to HDP2 and HDP3, so the file is updated on those hosts)
Modify the hive-site.xml configuration file in the Ambari management interface: Hive -> Advanced -> Custom hive-site, then select "Add Property ...". In the pop-up box, enter the key hive.aux.jars.path with the value /usr/hdp/2.4.2.0-258/hive/lib/guava-14.0.1.jar,/usr/hdp/2.4.2.0-258/hive/zookeeper-3.4.6.2.4.2.0-258.jar,/usr/hdp/2.4.2.0-258/hi
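If the same property were set by hand instead of through Ambari, it would land in hive-site.xml roughly as follows (a sketch; only the first two jars from the list above are shown, and the full comma-separated list for your cluster should go in the value):

<property>
  <name>hive.aux.jars.path</name>
  <value>/usr/hdp/2.4.2.0-258/hive/lib/guava-14.0.1.jar,/usr/hdp/2.4.2.0-258/hive/zookeeper-3.4.6.2.4.2.0-258.jar</value>
</property>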
Although Gmetad can be layered, each layer of Gmetad needs gweb enabled, which is rather troublesome. If the only worry is that a single gmetad is a weak point, gmetad can be made highly available, although I do not know whether that can be done as an automatic failover the way Hadoop HA does it.
Resource arrangement:
HDP1: gmetad, gmond, gweb
HDP2: gmetad, gmond, gweb
HDP3: gmond
Purpose of the configuration:
HDP1 and HDP2 make gmetad/gweb highly available, and each node's gweb can show the entire cluster.
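One common way to reduce the single-gmetad risk (a hedged sketch, not necessarily the author's exact setup) is to list more than one gmond host for the same cluster in data_source, so gmetad polls the next host if the first is unreachable:

data_source "zhj" hdp1 hdp2

This adds redundancy on the collection side, but it does not fail gweb over by itself; the two gweb instances still need something in front of them (a virtual IP or load balancer) for a seamless switch.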
CentOS 6.4 + Hadoop 2.2.0 Spark Pseudo-Distributed Installation
Hadoop is the stable version 2.2.0. Spark version: spark-0.9.1-bin-hadoop2, from http://spark.apache.org/downloads.html. Spark has three prebuilt versions:
For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so the download is for Hadoop 2 (spark-0.9.1-bin-hadoop2).
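A hedged sketch of fetching and unpacking that build (the mirror URL is an assumption; pick whichever mirror the downloads page above offers):

$ wget http://archive.apache.org/dist/spark/spark-0.9.1/spark-0.9.1-bin-hadoop2.tgz
$ tar -zxvf spark-0.9.1-bin-hadoop2.tgz
$ cd spark-0.9.1-bin-hadoop2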
1. Environment configuration
This cluster has three nodes
Master: hdp1
Slaves: hdp2, hdp3
OS: CentOS 6.5
Hadoop: 2.2.0
2. Download the installation package
HBase 0.98.0 Download Address: http://mirror.bit.edu.cn/apache/hbase/hbase-0.98.0/
3. Unzip the package into a local directory
$ tar -zxvf hbase-0.98.0-hadoop2-bin.tar.gz
Configure HBASE_HOME in the environment variables
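For example (the install path is an assumption based on the tarball name above; append these lines to ~/.bashrc or /etc/profile and re-source the file):

export HBASE_HOME=/usr/local/hbase-0.98.0-hadoop2
export PATH=$PATH:$HBASE_HOME/bin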
4. Configuration
Three files need to be modified: hbase-env.sh, hbase-site.xml, and regionservers
Modify hbase-env.sh
export JAVA_HOME
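A hedged sketch of the three files for the cluster planned above (the JDK path, the choice to let HBase manage ZooKeeper, and the HDFS address are assumptions; adjust them to the actual environment):

hbase-env.sh:
  export JAVA_HOME=/usr/local/java/jdk1.7.0
  export HBASE_MANAGES_ZK=true

hbase-site.xml:
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp1,hdp2,hdp3</value>
  </property>

regionservers:
  hdp2
  hdp3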
of the JVM as the actual memory. The memory configuration of map and reduce has the same problem. Example configuration:
mapred-site.xml
  set mapreduce.map.memory.mb=1024;
  set mapreduce.map.java.opts=-Xmx819m;
  set mapreduce.reduce.memory.mb=2048;
  set mapreduce.reduce.java.opts=-Xmx1638m;
yarn-site.xml
  set yarn.nodemanager.resource.memory-mb=2048;
  set yarn.app.mapreduce.am.command-opts=-Xmx1638m;
This article explains the cause of the problem and the recommended configuration in detail:
http://docs.horto
testing process.
The following are the configuration suggestions provided by Hortonworks:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_installing_manually_book/content/rpm-chap1-11.html
4.1 Memory Allocation
Reserved Memory = Reserved for stack memory + Reserved for HBase Memory (If HBase is on the same node)
The total system memory is 126 GB, and 24 GB is reserved for the operating system. If HBase exists, the reserved memory
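To make the rest of the calculation concrete, here is a hedged worked example following the formulas in the Hortonworks guide linked above; the core count, disk count, and minimum container size are assumptions, not values from the original text:
Assume 16 cores, 12 disks, no HBase, and a minimum container size of 2048 MB (the guide's suggestion for nodes with more than 24 GB of RAM).
Available RAM = 126 GB - 24 GB (OS) = 102 GB
Number of containers = min(2 * 16 cores, 1.8 * 12 disks, 102 GB / 2 GB) = min(32, 21, 51) = 21
RAM per container = max(2 GB, 102 GB / 21) ≈ 4.9 GB
yarn.nodemanager.resource.memory-mb = 21 * 4.9 GB ≈ 102 GB
mapreduce.map.memory.mb = 4.9 GB; mapreduce.reduce.memory.mb = 2 * 4.9 GB ≈ 9.8 GB
mapreduce.map.java.opts ≈ 0.8 * 4.9 GB ≈ -Xmx3900m, i.e. the heap stays below the container size, as in the example configuration earlier.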
a "pro-son" Spark. There are some differences in support, but basically the interfaces that are often used are supported.Thanks to its strong performance in data science, the Python language fans are all over the world. Now it's time to meet the powerful distributed memory computing framework Spark, two areas of the strong come together. Nature can touch more powerful sparks (spark translates into Sparks), so Pyspark is the protagonist of this section.In the Hadoop release, both CDH5 and
Preparing Resources
JDK download address:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
MySQL download address:
https://dev.mysql.com/downloads/mysql/
Ambari and HDP (adjust the version numbers to match the version being installed):
Ambari 2.2.2
http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz
HDP 2.4.2
http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.0.0/HDP-2.4.0.0-centos7-rpm.tar.gz
HDP-UTILS 1.1.0
Ht
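Since these packages are large, the offline (local repository) route mentioned earlier is the usual approach. A hedged sketch of that setup; the web-server paths and the way the tarballs unpack are assumptions:

$ yum install -y httpd && service httpd start
$ mkdir -p /var/www/html/ambari /var/www/html/hdp
$ tar -zxvf ambari-2.2.2.0-centos7.tar.gz -C /var/www/html/ambari
$ tar -zxvf HDP-2.4.0.0-centos7-rpm.tar.gz -C /var/www/html/hdp

Then point the baseurl entries in ambari.repo and HDP.repo at http://<this-host>/ambari/... and http://<this-host>/hdp/... on every node.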
support a variety of Hadoop platforms; for example, starting with version 0.8.1 there are separate builds for Hadoop 1 (HDP1, CDH3), CDH4, and Hadoop 2 (HDP2, CDH5). At present, Cloudera's CDH5 even lets you select the Spark service directly when installing through CM.
Currently the latest version of Spark is 1.3.0. This article uses version 1.3.0 to show how to set up both a single-machine pseudo-distributed Spark installation and a distributed cluster installation.
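As a hedged sketch of the standalone (pseudo-distributed) setup described here, assuming the tarball has been unpacked on hdp1 and the default ports are used:

conf/spark-env.sh:
  export SPARK_MASTER_IP=hdp1
  export SPARK_WORKER_MEMORY=1g
conf/slaves:
  hdp1        (single machine: the worker runs on the same host as the master)
Start everything and check the processes:
  $ sbin/start-all.sh
  $ jps        (should show Master and Worker)
The master web UI is then at http://hdp1:8080 by default.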
8.1 SOLR Installation and configuration
1. Obtain the SOLR resources
cd $HOME
git clone https://github.com/apache/incubator-ranger.git
2. Run the following command to install SOLR:
yum install lucidworks-hdpsearch
Note: SOLR will be installed to /opt/lucidworks-hdpsearch/solr
3. Modify the configuration file:
vi install.properties
java_home=/usr/local/java/jdk1.8.0_91
solr_install_folder=/opt/lucidworks-hdpsearch/solr
Note: mainly modify the two properties above; the oth